{
 "cells": [
  {
   "cell_type": "markdown",
   "source": [
    "# Python SDK to access the ZhipuAI GLM-4 Vision API\n",
    "\n",
    "**This tutorial is available in English and is attached below the Chinese explanation**\n",
    "\n",
    "此代码将讲述如何使用Python SDK 调用 GLM-4V API，来完成简单的视觉理解和分析工作。\n",
    "\n",
    "This cookbook will describe how to use the Python SDK to call the GLM-4V API to complete simple visual understanding and analysis work."
   ],
   "metadata": {
    "collapsed": false
   },
   "id": "b9d2202d7ce04a48"
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "os.environ[\"ZHIPUAI_API_KEY\"] = \"your api key\""
   ],
   "metadata": {
    "collapsed": false,
    "ExecuteTime": {
     "end_time": "2024-04-28T11:49:42.595265Z",
     "start_time": "2024-04-28T11:49:42.590710Z"
    }
   },
   "id": "f03a65879fc8edb8"
  },
  {
   "cell_type": "markdown",
   "source": [
    "首先，我们需要将图片转为可以上传的base64格式，这里我们使用PIL库来完成这个工作"
   ],
   "metadata": {
    "collapsed": false
   },
   "id": "524d71278f724da1"
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "outputs": [],
   "source": [
    "import base64\n",
    "import io\n",
    "from zhipuai import ZhipuAI\n",
    "from PIL import Image\n",
    "\n",
    "client = ZhipuAI()\n",
    "\n",
    "\n",
    "def image_to_base64(image_path):\n",
    "    \"\"\"\n",
    "    Convert an image to base64 encoding.\n",
    "    \"\"\"\n",
    "    with Image.open(image_path) as image:\n",
    "        buffered = io.BytesIO()\n",
    "        image.save(buffered, format=\"JPEG\")  # or format=\"PNG\", depending on your image.\n",
    "        img_str = base64.b64encode(buffered.getvalue()).decode()\n",
    "    return f\"data:image/jpeg;base64,{img_str}\"\n",
    "\n",
    "\n",
    "base64_image = image_to_base64(\"data/zR.jpg\")"
   ],
   "metadata": {
    "collapsed": false,
    "ExecuteTime": {
     "end_time": "2024-04-28T11:49:42.886061Z",
     "start_time": "2024-04-28T11:49:42.596510Z"
    }
   },
   "id": "4f3135b451b85400"
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "outputs": [],
   "source": [
    "messages = [\n",
    "    {\n",
    "        \"role\": \"user\",\n",
    "        \"content\": [\n",
    "            {\n",
    "                \"type\": \"text\",\n",
    "                \"text\": \"what is this image describe?\"\n",
    "            },\n",
    "            {\n",
    "                \"type\": \"image_url\",\n",
    "                \"image_url\": {\n",
    "                    \"url\": base64_image\n",
    "                }\n",
    "\n",
    "            }\n",
    "        ]\n",
    "    }\n",
    "]"
   ],
   "metadata": {
    "collapsed": false,
    "ExecuteTime": {
     "end_time": "2024-04-28T11:49:42.889116Z",
     "start_time": "2024-04-28T11:49:42.886885Z"
    }
   },
   "id": "7898ce8f4610930e"
  },
  {
   "cell_type": "markdown",
   "source": [
    "我们已经组织好了信息和图片，现在让我们按照[官方文档](https://open.bigmodel.cn/dev/api#glm-4v)的内容传入对应的参数并获取模型的回答\n",
    "\n",
    "We have organized the information and pictures, now let us follow the [official document](https://open.bigmodel.cn/dev/api#glm-4v) to pass in the corresponding parameters and get the model's answer"
   ],
   "metadata": {
    "collapsed": false
   },
   "id": "5637a50d4f7c7383"
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "temperature:0.7\n",
      "top_p:0.9\n",
      "temperature:0.7\n",
      "top_p:0.9\n"
     ]
    }
   ],
   "source": [
    "response = client.chat.completions.create(\n",
    "    model=\"glm-4v\",\n",
    "    messages=messages,\n",
    "    temperature=0.7,\n",
    "    top_p=0.9\n",
    ")"
   ],
   "metadata": {
    "collapsed": false,
    "ExecuteTime": {
     "end_time": "2024-04-28T11:49:50.218979Z",
     "start_time": "2024-04-28T11:49:42.889930Z"
    }
   },
   "id": "b9ba237babf6fb35"
  },
  {
   "cell_type": "markdown",
   "source": [
    "通过这个操作，我们将能得到模型对这张图的描述。\n",
    "\n",
    "Through this operation, we will be able to get the model's description of this picture."
   ],
   "metadata": {
    "collapsed": false
   },
   "id": "fd97d54e0506c728"
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "outputs": [
    {
     "data": {
      "text/plain": "Completion(model='glm-4v', created=1714304990, choices=[CompletionChoice(index=0, finish_reason='stop', message=CompletionMessage(content=\"The image depicts a stylized illustration of a figure, viewed from the side, set against a dark background. The figure is wearing a long, flowing dress that drapes elegantly to the floor, with the hem and sleeves showing gradients of color transitioning from a light tone at the top to a darker shade towards the bottom and edges. One hand of the figure is slightly raised, holding what appears to be a violin by its neck, indicating that the figure may be a musician or a representation of one. The violin is also depicted in a simplified form, aligning with the overall abstract and artistic nature of the illustration. The movement suggested by the figure's posture and the flow of the dress gives a sense of grace and poise. The use of negative space and simplicity in design creates an elegant and sophisticated mood.\", role='assistant', tool_calls=None))], request_id='8611192515567253794', id='8611192515567253794', usage=CompletionUsage(prompt_tokens=1674, completion_tokens=168, total_tokens=1842))"
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "response"
   ],
   "metadata": {
    "collapsed": false,
    "ExecuteTime": {
     "end_time": "2024-04-28T11:49:50.225625Z",
     "start_time": "2024-04-28T11:49:50.221279Z"
    }
   },
   "id": "584b816346cec5b9"
  },
  {
   "cell_type": "markdown",
   "source": [
    "你还可以对这张图片进行更多的提问，并使用历史记录的方式保留之前的提问和回答。现在，我将为这一段对话添加一段新的历史\n",
    "\n",
    "You can also ask more questions about this picture and use historical records to retain previous questions and answers. Now I'm going to add a new history to your conversation"
   ],
   "metadata": {
    "collapsed": false
   },
   "id": "ebf92a68c1a51e8c"
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "outputs": [
    {
     "data": {
      "text/plain": "{'role': 'user',\n 'content': [{'type': 'text',\n   'text': 'what is the color of hair and dress the this women?'}]}"
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "messages += [\n",
    "    {\n",
    "        \"role\": \"user\",\n",
    "        \"content\": [\n",
    "            {\n",
    "                \"type\": \"text\",\n",
    "                \"text\": \"what is the color of the hair?\"\n",
    "            }\n",
    "        ]\n",
    "    },\n",
    "    {\n",
    "        \"role\": \"assistant\",\n",
    "        \"content\": \"It is pink\"\n",
    "    },\n",
    "    {\n",
    "        \"role\": \"user\",\n",
    "        \"content\": [\n",
    "            {\n",
    "                \"type\": \"text\",\n",
    "                \"text\": \"what is the color of hair and dress the this women?\"\n",
    "            }\n",
    "        ]\n",
    "    },\n",
    "]\n",
    "messages[-1]"
   ],
   "metadata": {
    "collapsed": false,
    "ExecuteTime": {
     "end_time": "2024-04-28T11:49:50.230467Z",
     "start_time": "2024-04-28T11:49:50.226443Z"
    }
   },
   "id": "eb264e5c17d6eba7"
  },
  {
   "cell_type": "markdown",
   "source": [
    "现在，我们再次请求，看看模型返回的结果\n",
    "\n",
    "Now, we request again and see the results returned by the model"
   ],
   "metadata": {
    "collapsed": false
   },
   "id": "b9decda6959a9498"
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "temperature:0.7\n",
      "top_p:0.9\n",
      "temperature:0.7\n",
      "top_p:0.9\n"
     ]
    },
    {
     "data": {
      "text/plain": "Completion(model='glm-4v', created=1714304992, choices=[CompletionChoice(index=0, finish_reason='stop', message=CompletionMessage(content='The woman in the image has long, flowing hair that appears to be a pale pink or blush color. Her dress is dark, likely black or a very deep navy blue.', role='assistant', tool_calls=None))], request_id='8611192515567253881', id='8611192515567253881', usage=CompletionUsage(prompt_tokens=1706, completion_tokens=37, total_tokens=1743))"
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "response = client.chat.completions.create(\n",
    "    model=\"glm-4v\",\n",
    "    messages=messages,\n",
    "    temperature=0.7,\n",
    "    top_p=0.9\n",
    ")\n",
    "response"
   ],
   "metadata": {
    "collapsed": false,
    "ExecuteTime": {
     "end_time": "2024-04-28T11:49:52.986585Z",
     "start_time": "2024-04-28T11:49:50.231225Z"
    }
   },
   "id": "f57b9cae71780d3"
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
